healthcare

Introduction

This is the webpage for our final project. Using what we’ve learned throughout the semester, we created various visualizations all while using techniques such as joins via Tableau and Shiny. The Shiny application visualizations can be found on Jack’s Shiny account which you can navigate to by clicking this link: https://jackchang.shinyapps.io/finproj1/

R Configuration

Below is our R session info.

sessionInfo(package=NULL)
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
##  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.10    stringi_1.1.2  
##  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.1.0   digest_0.6.11  
## [13] evaluate_0.10

Data Clean-up

This code generates a post-ETL csv for our healthcare data into our CSV folder. You can find it in our S17 DV Final Project dataset on data.world and download it by clicking the download button one the healthcare.csv file!

source("../01 Data/ETL_healthcare.R")
## Loading required package: readr
## Warning: package 'readr' was built under R version 3.3.3
## Loading required package: plyr
## Loading required package: jsonlite
## Warning: package 'jsonlite' was built under R version 3.3.3
## Loading required package: RCurl
## Loading required package: bitops
## [1] "Here are the columns of the CSV we are dealing with..."
## Parsed with column specification:
## cols(
##   `DRG Definition` = col_character(),
##   `Provider Id` = col_integer(),
##   `Provider Name` = col_character(),
##   `Provider Street Address` = col_character(),
##   `Provider City` = col_character(),
##   `Provider State` = col_character(),
##   `Provider Zip Code` = col_integer(),
##   `Hospital Referral Region Description` = col_character(),
##   `Total Discharges` = col_integer(),
##   `Average Covered Charges` = col_character(),
##   `Average Total Payments` = col_character(),
##   `Average Medicare Payments` = col_character()
## )
## [1] "Done! Check your CSV Folder."

Joining

We joined our healthcare.csv with the census data keyed on states using Tableau. We wanted to see how many young males and females are insured in each of these states, and could do so by looking at our joined data. We also joined this merged dataset to a dataset named Poverty, which has information on number of people in and out of poverty in each state. The following png shows the join in Tableau. We did the same join using SQL for our Shiny App.

Graph 1: Crosstab

This first visual is a crosstab which joins all 3 of our datasets (which are related to healchare) throughout the US. The crosstab shows the cost per discharge in the 5 least insured states. It includes different sets to organize the data as well as a KPI of cost per discharge with three different parameters: high, medium, and low.

Graph 2: Barchart with ID Sets

This bargraph selectively shows the sum of the average total payments per year of the different heart disease procedures from 3 most populous states and the 3 least populous states. We can use the state and Diagnosis Related Group (DRG) filters to adjust the Bargraph as we like while also using ID sets to create different kinds of sets from the heart procedures.

Graph 3: Treemap/Histogram

This visual is a treemap which shows the total annual number of discharges per state KPI. This treemap helps understand what to expect from different facets of our healthcare data, including which states should insure the most/least people, which states should have the highest payments, and much more. Ultimately this visual serves to give a more general understanding of such a large collection of data.

Here is a histogram from shiny showing the same concept:

Graph 4: Barchart with Table Calculation

This bargraph joins data from 2 of our datasets, USA_All_Healthcare.csv and healthcare.csv to show the percentage of females insured per state versus the percentage of males insured per state. More specifically, rather than showing the actual percentage of each gender, the visual compares each state’s female and male insured percentage to the national average of each gender’s insured percentage.

Graph 5: Scatterplot

This scatter joins all 3 of our datasets: Poverty.csv, USA_All_States.csv, and healthcare.csv and shows the how the percentage of poverty as a function of percentage of people insured per state. Here we see a general trend that explains that as percentage of people insured per state increases, the percentage of poverty decreases. In other words, these two variables are negatively correlated.

Graph 6: Map

This map joins Poverty.csv and healthcare.csv. A KPI was created which creates a ratio of the number of youth (18-24) below the poverty line over the total number of people below poverty line. When certain states are chosen, an action will open up a bar chart which will compare that KPI between different states.

Graph 7: BoxPlot

This boxplot shows the average and quartile information for average medicare payment in each state. The color is keyed by insurance rate and the sorted from left to right. This is interesting in that it looks like the states with the highest median payment also tend to have the highest insurance rates.